AITopics | Negros Island Region

Collaborating Authors

Negros Island Region

HiligayNER: A Baseline Named Entity Recognition Model for Hiligaynon

Teves, James Ald, Cal, Ray Daniel, Villaluz, Josh Magdiel, Malolos, Jean, Magtira, Mico, Rodriguez, Ramon, Abisado, Mideth, Imperial, Joseph Marvin

arXiv.org Artificial IntelligenceOct-14-2025

The language of Hiligaynon, spoken predominantly by the people of Panay Island, Negros Occidental, and Soccsksargen in the Philippines, remains underrepresented in language processing research due to the absence of annotated corpora and baseline models. This study introduces HiligayNER, the first publicly available baseline model for the task of Named Entity Recognition (NER) in Hiligaynon. The dataset used to build HiligayNER contains over 8,000 annotated sentences collected from publicly available news articles, social media posts, and literary texts. Two Transformer-based models, mBERT and XLM-RoBERTa, were fine-tuned on this collected corpus to build versions of HiligayNER. Evaluation results show strong performance, with both models achieving over 80% in precision, recall, and F1-score across entity types. Furthermore, cross-lingual evaluation with Cebuano and Tagalog demonstrates promising transferability, suggesting the broader applicability of HiligayNER for multilingual NLP in low-resource settings. This work aims to contribute to language technology development for underrepresented Philippine languages, specifically for Hiligaynon, and support future research in regional language processing.

computational linguistic, information retrieval, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.10776

Country:

Asia > Philippines > Visayas > Negros Island Region > Province of Negros Occidental (0.24)
Asia > Philippines > Mindanao > Soccsksargen (0.24)
Europe > Austria > Vienna (0.14)
(15 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FilBench: Can LLMs Understand and Generate Filipino?

Miranda, Lester James V., Aco, Elyanah, Manuel, Conner, Cruz, Jan Christian Blaise, Imperial, Joseph Marvin

arXiv.org Artificial IntelligenceAug-6-2025

Despite the impressive performance of LLMs on English-based tasks, little is known about their capabilities in specific languages such as Filipino. In this work, we address this gap by introducing FilBench, a Filipino-centric benchmark designed to evaluate LLMs across a diverse set of tasks and capabilities in Filipino, Tagalog, and Cebuano. We carefully curate the tasks in FilBench to reflect the priorities and trends of NLP research in the Philippines such as Cultural Knowledge, Classical NLP, Reading Comprehension, and Generation. By evaluating 27 state-of-the-art LLMs on FilBench, we find that several LLMs suffer from reading comprehension and translation capabilities. Our results indicate that FilBench is challenging, with the best model, GPT-4o, achieving only a score of 72.23%. Moreover, we also find that models trained specifically for Southeast Asian languages tend to underperform on FilBench, with the highest-performing model, SEA-LION v3 70B, achieving only a score of 61.07%. Our work demonstrates the value of curating language-specific LLM benchmarks to aid in driving progress on Filipino NLP and increasing the inclusion of Philippine languages in LLM development.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.03523

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Serbia > Central Serbia > Belgrade (0.04)
Asia > Southeast Asia (0.04)
(16 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators

Gu, Feng, Li, Zongxia, Colon, Carlos Rafael, Evans, Benjamin, Mondal, Ishani, Boyd-Graber, Jordan Lee

arXiv.org Artificial IntelligenceMar-9-2025

Event annotation is important for identifying market changes, monitoring breaking news, and understanding sociological trends. Although expert annotators set the gold standards, human coding is expensive and inefficient. Unlike information extraction experiments that focus on single contexts, we evaluate a holistic workflow that removes irrelevant documents, merges documents about the same event, and annotates the events. Although LLM-based automated annotations are better than traditional TF-IDF-based methods or Event Set Curation, they are still not reliable annotators compared to human experts. However, adding LLMs to assist experts for Event Set Curation can reduce the time and mental effort required for Variable Annotation. When using LLMs to extract event variables to assist expert annotators, they agree more with the extracted variables than fully automated LLMs for annotation.

agreement, annotation, annotator, (13 more...)

arXiv.org Artificial Intelligence

2503.06778

Country:

North America > United States > New York > New York County > New York City (0.15)
Asia > Middle East > Jordan (0.05)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(12 more...)

Genre: Research Report (1.00)

Industry:

Law Enforcement & Public Safety > Terrorism (0.48)
Media > News (0.48)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Advancing Vehicle Plate Recognition: Multitasking Visual Language Models with VehiclePaliGemma

AlDahoul, Nouar, Tan, Myles Joshua Toledo, Tera, Raghava Reddy, Karim, Hezerul Abdul, Lim, Chee How, Mishra, Manish Kumar, Zaki, Yasir

arXiv.org Artificial IntelligenceDec-14-2024

License plate recognition (LPR) involves automated systems that utilize cameras and computer vision to read vehicle license plates. Such plates collected through LPR can then be compared against databases to identify stolen vehicles, uninsured drivers, crime suspects, and more. The LPR system plays a significant role in saving time for institutions such as the police force. In the past, LPR relied heavily on Optical Character Recognition (OCR), which has been widely explored to recognize characters in images. Usually, collected plate images suffer from various limitations, including noise, blurring, weather conditions, and close characters, making the recognition complex. Existing LPR methods still require significant improvement, especially for distorted images. To fill this gap, we propose utilizing visual language models (VLMs) such as OpenAI GPT4o, Google Gemini 1.5, Google PaliGemma (Pathways Language and Image model + Gemma model), Meta Llama 3.2, Anthropic Claude 3.5 Sonnet, LLaVA, NVIDIA VILA, and moondream2 to recognize such unclear plates with close characters. This paper evaluates the VLM's capability to address the aforementioned problems. Additionally, we introduce ``VehiclePaliGemma'', a fine-tuned Open-sourced PaliGemma VLM designed to recognize plates under challenging conditions. We compared our proposed VehiclePaliGemma with state-of-the-art methods and other VLMs using a dataset of Malaysian license plates collected under complex conditions. The results indicate that VehiclePaliGemma achieved superior performance with an accuracy of 87.6\%. Moreover, it is able to predict the car's plate at a speed of 7 frames per second using A100-80GB GPU. Finally, we explored the multitasking capability of VehiclePaliGemma model to accurately identify plates containing multiple cars of various models and colors, with plates positioned and oriented in different directions.

large language model, machine learning, recognition, (20 more...)

arXiv.org Artificial Intelligence

2412.14197

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Asia > Malaysia (0.04)
North America > United States > New York (0.04)
Asia > Philippines > Visayas > Negros Island Region > Province of Negros Occidental > City of Bacolod (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (0.93)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

Xing, Junjie, He, Yeye, Zhou, Mengyu, Dong, Haoyu, Han, Shi, Zhang, Dongmei, Chaudhuri, Surajit

arXiv.org Artificial IntelligenceOct-15-2024

In this work, we propose Table-LLM-Specialist, or Table-Specialist for short, as a new self-trained fine-tuning paradigm specifically designed for table tasks. Our insight is that for each table task, there often exist two dual versions of the same task, one generative and one classification in nature. Leveraging their duality, we propose a Generator-Validator paradigm, to iteratively generate-then-validate training data from language-models, to fine-tune stronger \sys models that can specialize in a given task, without requiring manually-labeled data. Our extensive evaluations suggest that our Table-Specialist has (1) \textit{strong performance} on diverse table tasks over vanilla language-models -- for example, Table-Specialist fine-tuned on GPT-3.5 not only outperforms vanilla GPT-3.5, but can often match or surpass GPT-4 level quality, (2) \textit{lower cost} to deploy, because when Table-Specialist fine-tuned on GPT-3.5 achieve GPT-4 level quality, it becomes possible to deploy smaller models with lower latency and inference cost, with comparable quality, and (3) \textit{better generalizability} when evaluated across multiple benchmarks, since \sys is fine-tuned on a broad range of training data systematically generated from diverse real tables. Our code and data will be available at https://github.com/microsoft/Table-LLM-Specialist.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.12164

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Russia (0.14)
Asia > Russia (0.14)
(73 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Media (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CebuaNER: A New Baseline Cebuano Named Entity Recognition Model

Pilar, Ma. Beatrice Emanuela, Papas, Ellyza Mari, Buenaventura, Mary Loise, Dedoroy, Dane, Montefalcon, Myron Darrel, Padilla, Jay Rhald, Maceda, Lany, Abisado, Mideth, Imperial, Joseph Marvin

arXiv.org Artificial IntelligenceOct-1-2023

Despite being one of the most linguistically diverse groups of countries, computational linguistics and language processing research in Southeast Asia has struggled to match the level of countries from the Global North. Thus, initiatives such as open-sourcing corpora and the development of baseline models for basic language processing tasks are important stepping stones to encourage the growth of research efforts in the field. To answer this call, we introduce CebuaNER, a new baseline model for named entity recognition (NER) in the Cebuano language. Cebuano is the second most-used native language in the Philippines, with over 20 million speakers. To build the model, we collected and annotated over 4,000 news articles, the largest of any work in the language, retrieved from online local Cebuano platforms to train algorithms such as Conditional Random Field and Bidirectional LSTM. Our findings show promising results as a new baseline model, achieving over 70% performance on precision, recall, and F1 across all entity tags, as well as potential efficacy in a crosslingual setup with Tagalog.

computational linguistic, entity recognition, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2310.00679

Country:

Asia > Southeast Asia (0.24)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > Canada > Ontario > Toronto (0.04)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Localization and Classification of Parasitic Eggs in Microscopic Images Using an EfficientDet Detector

AlDahoul, Nouar, Karim, Hezerul Abdul, Kee, Shaira Limson, Tan, Myles Joshua Toledo

arXiv.org Artificial IntelligenceAug-3-2022

IPIs caused by protozoan and helminth parasites are among the most common infections in humans in LMICs. They are regarded as a severe public health concern, as they cause a wide array of potentially detrimental health conditions. Researchers have been developing pattern recognition techniques for the automatic identification of parasite eggs in microscopic images. Existing solutions still need improvements to reduce diagnostic errors and generate fast, efficient, and accurate results. Our paper addresses this and proposes a multi-modal learning detector to localize parasitic eggs and categorize them into 11 categories. The experiments were conducted on the novel Chula-ParasiteEgg-11 dataset that was used to train both EfficientDet model with EfficientNet-v2 backbone and EfficientNet-B7+SVM. The dataset has 11,000 microscopic training images from 11 categories. Our results show robust performance with an accuracy of 92%, and an F1 score of 93%. Additionally, the IOU distribution illustrates the high localization capability of the detector.

dataset, detector, microscopic image, (16 more...)

arXiv.org Artificial Intelligence

2208.01963

Country:

Asia > Malaysia (0.05)
Asia > Philippines > Visayas > Negros Island Region > Province of Negros Occidental > City of Bacolod (0.05)
Asia > Middle East > Republic of Türkiye (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback